The crystallographic structure of the Mycobacterium tuberculosis (TB) protein 
Rv3902c (176 residues; molecular mass of 19.8 kDa) was determined at 1.55 Å 
resolution. The function of Rv3902c is unknown, although several TB genes 
involved in bacterial pathogenesis are expressed from the operon containing 
the Rv3902c gene. The unique structural fold of Rv3902c contains two domains, 
each consisting of antiparallel β-sheets and α-helices, creating a hand-like 
binding motif with a small binding pocket in the palm. Structural homology 
searches reveal that Rv3902c has an overall structure similar to that of the 
Salmonella virulence-factor chaperone InvB, with an r.m.s.d. for main-chain 
atoms of 2.3 Å along an aligned domain. 

1. Introduction 

Mycobacterium tuberculosis (TB) is an aerobic acid-fast Gram-positive 
bacterium. A waxy coating on the cell surface of the TB 
organism makes it exceedingly resistant to host defense mechanisms, 
particularly when present in the alveoli of lung tissue. Despite the 
wide use of antibiotics and attenuated vaccines, TB is one of the 
leading causes of death from bacterial infections, killing an estimated 
1.3 million people each year (World Health Organization, 2012). 
Today, multidrug-resistant tuberculosis (MDR-TB) and extensively 
drug-resistant tuberculosis (XDR-TB) are becoming more prevalent in 
both developing and industrialized nations, and have been reported 
in 84 countries (World Health Organization, 2012). 
There is a vital need for new vaccines and small-molecule therapeutics 
to combat these drug-resistant strains of TB. In 1998, 
researchers sequenced the complete genome of TB and revealed 
approximately 4000 genes (Cole et al., 1998), which was followed by 
the establishment of the TB Structural Genomics Consortium under 
the NIH Protein Structure Initiative in 2000. The TB Structural 
Genomics Consortium aims to expedite the structure determination 
of all TB proteins in order to further the understanding 
of TB biology (Chim et al., 2011). It is hoped that this structural 
knowledge will provide a basis for the identification of new protein 
targets and drug-discovery strategies with which to treat infections 
caused by TB. 

Rv3902c is a 176-amino-acid TB protein with a molecular mass of 19.8 kDa 
that is expressed from the same operon as EsxF, EsxE and Rv3903c 
(http://www.tbdb.org). The function of Rv3902c is unknown; however, the 
products of two genes transcribed along with Rv3902c, EsxE and EsxF, are 
paralogs of early secreted antigenic target 6 (Esat-6) proteins found in 
TB (Agarwal et al., 2007). Esat-6 proteins are potent T-cell antigens and 
play a role in TB pathogenesis (Smith, 2003). Because Rv3902c is expressed 
from the same operon as these Esat-6 paralogs, it may also play a role in 
TB virulence. Here, we present the crystal structure of the TB protein 
Rv3902c. 



2. Materials and methods 

2.1. Protein expression and purification 

Rv3902c was subcloned into a pVP16 vector, which provides an N-terminal 
fusion consisting of a 6×His tag, maltose-binding protein (MBP) and a 
Tobacco etch virus (TEV) protease cleavage site, via the Gateway cloning 
method (Invitrogen). It should be noted that owing to the TEV construct 
used, an additional serine remains at the N-terminus of the protein after 
TEV cleavage. The resulting construct N-His8-MBP-linker-TEV+Rv3902c/pVP16 
was transformed into Escherichia coli BL21(DE3) cells. A seed culture was 
created by inoculating a single ampicillin-resistant colony into 25 ml 
Luria broth (LB) containing 100 µg ml⁻¹ ampicillin and culturing it 
overnight in an incubator/shaker set at 250 rev min⁻¹ and 37°C. On the 
following day, a 1:100 dilution of the seed culture was placed into two 
1 l LB flasks each containing 100 µg ml⁻¹ ampicillin and grown in an 
incubator/shaker set at 250 rev min⁻¹ and 37°C for ~4 h. When an OD600 of 
~0.6 was reached, the flasks were removed from the incubator and cooled on 
ice to 16°C, and isopropyl β-D-1-thiogalactopyranoside (IPTG) was added 
to a final concentration of 0.5 mM to induce expression. Induced 
cultures were allowed to grow overnight in an incubator/shaker set at 
250 rev min⁻¹ and 16°C. 

Table 1 

Summary of Rv3902c crystallographic data. 

Values in parentheses are for the highest resolution shell. 

                                      Native                      Br-soaked
Experiment                            Native                      SAD
Wavelength (Å)                        1.0000                      0.9198
Unit-cell parameters (Å)              a = b = 91.81, c = 54.23    a = b = 92.08, c = 54.03
Space group                           P6₁                         P6₁
Resolution (Å)                        39.76-1.55 (1.58-1.55)      32.08-1.62 (1.65-1.62)
No. of unique reflections             37938 (1875)                33300 (1646)
Completeness (%)                      100 (100)                   100 (100)
Multiplicity                          14.7 (11.5)                 7.5 (6.5)
Mean I/σ(I)                           18.2 (2.1)                  11.8 (2.0)
Molecules in asymmetric unit          1                           1
Matthews coefficient (Å³ Da⁻¹)        3.32                        3.32
Solvent content (%)                   62.90                       62.98
R_merge                               0.087 (1.373)               0.096 (0.972)
R_meas                                0.094 (1.510)               0.110 (1.146)
[R_p.i.m.]                            0.034 (0.617)               0.054 (0.603)
CC1/2                                 0.999 (0.609)               0.997 (0.597)
Anomalous completeness (%)                                        91.7 (88.8)
Anomalous multiplicity                                            3.5 (2.9)
Refinement statistics
  R_work/R_free (%)                   16.5/17.7
  No. of atoms
    Protein                           1595
    Water                             221
    Heavy atoms                                                   18
  R.m.s. deviations
    Bond lengths (Å)                  0.006
    Bond angles (°)                   0.98
  Average B factor (Å²)
    Protein                           22.1
    Water                             37.3
  Ramachandran plot statistics, residues in
    Most favored regions              144
    Additionally allowed regions      8
    Generously allowed regions        0
    Disallowed regions                0
  MolProbity statistics
    Score                             1.08
    Clashscore                        2.95
    Poor rotamers (%)                 0

The following morning, the incubated cells were centrifuged at 
8000g and resuspended in 70 ml buffer A (20 mM Tris-HCl pH 8.0, 
200 mM KCl, 5% glycerol, 1.4 mM β-mercaptoethanol) with 1× 
complete Protease Inhibitor (Roche). All subsequent purification 
steps were performed either on ice or at 4°C. The cells were sonicated 
and the cellular debris was spun down at 25 000g for 30 min. The 
supernatant was filtered using a 0.45 µm filter, supplemented with 
imidazole to a final concentration of 20 mM and batch bound to 5 ml 
Ni-NTA Superflow beads (Qiagen) pre-equilibrated with buffer A 
and rocked for 1 h. The Ni-NTA beads were packed into a column 
and attached to an ÄKTA FPLC system (GE Healthcare). The Ni-NTA 
column was washed with five column volumes (CV) of 95% 
buffer A and 5% buffer B (buffer A plus 1 M imidazole) and eluted at 
2 ml min⁻¹ in a gradient from 5 to 100% buffer B over 10 CV. The 
purity of the peak fractions was analyzed via SDS-PAGE, followed by 
concentration of the peak fractions to a final volume of 5 ml with 
Amicon 10 000 NMWL (Millipore) spin concentrators. TEV protease 
was added to the pooled and concentrated fractions in a 1:100 
(protease:protein) ratio to cleave the His-tagged MBP, followed by 
gentle overnight rocking. The sample was filtered with a 0.45 µm filter 
and applied onto a Superdex 75 size-exclusion column (26 mm 
diameter × 60 cm length; flow rate of 2.5 ml min⁻¹) attached to an 
ÄKTA FPLC system pre-equilibrated with buffer A. The purity of 
the peak fractions was analyzed via SDS-PAGE. Since further 
purification was required to remove the His-tagged MBP, the peak 
fractions were applied onto a Ni-NTA column pre-equilibrated with 
buffer A. Because the purified and cleaved Rv3902c did not bind to 
the Ni-NTA column, the flowthrough was collected, concentrated to 
25 mg ml⁻¹ and then analyzed via SDS-PAGE. The purified protein 
was snap-frozen in liquid nitrogen and stored at −80°C. 
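
The volumes and concentrations quoted in this section can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 1 M IPTG stock and the 5 ml packed column volume are assumptions that are not stated in the text (only the 5 ml of Ni-NTA beads is).

```python
# Solution arithmetic for the expression and purification steps.
# Values marked "assumed" are illustrative and are not taken from the text.

def stock_volume_ml(final_conc, final_volume_ml, stock_conc):
    """Solve C1*V1 = C2*V2 for the stock volume V1 (same concentration units)."""
    return final_conc * final_volume_ml / stock_conc

def imidazole_mM(percent_b, buffer_b_mM=1000.0):
    """Imidazole concentration when mixing buffer A (none) with buffer B (1 M)."""
    return buffer_b_mM * percent_b / 100.0

def percent_b_at(cv_elapsed, start=5.0, end=100.0, length_cv=10.0):
    """Percentage of buffer B at a given point of a linear gradient."""
    fraction = min(max(cv_elapsed / length_cv, 0.0), 1.0)
    return start + (end - start) * fraction

def gradient_minutes(length_cv, column_volume_ml, flow_ml_per_min):
    """Run time of a gradient expressed in column volumes."""
    return length_cv * column_volume_ml / flow_ml_per_min

inoculum_ml = 1000.0 / 100.0                    # 1:100 dilution into a 1 l flask
iptg_ml = stock_volume_ml(0.5, 1000.0, 1000.0)  # assumed 1 M (1000 mM) IPTG stock
wash_mM = imidazole_mM(5.0)                     # 5% buffer B wash step
mid_mM = imidazole_mM(percent_b_at(5.0))        # halfway through the gradient
minutes = gradient_minutes(10.0, 5.0, 2.0)      # assumed 5 ml column volume

print(f"seed culture per flask: {inoculum_ml:.0f} ml")
print(f"IPTG stock per flask:   {iptg_ml:.2f} ml")
print(f"imidazole in wash:      {wash_mM:.0f} mM")
print(f"imidazole at 5 CV:      {mid_mM:.0f} mM")
print(f"gradient duration:      {minutes:.0f} min")
```

Under these assumptions the two 1 l flasks need about 20 ml of seed culture in total, which fits within the 25 ml overnight culture, and the 5% buffer B wash corresponds to 50 mM imidazole.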

2.2. Crystallization, data collection and structure determination 

Nanocrystallization trials of purified Rv3902c were conducted with 
kits from Hampton Research, Emerald Bio and Qiagen. Each trial 
utilized a Gryphon (Art Robbins Instruments) nanodispensing robot 
with 200 nl drops (1:1 protein:well solution volume ratio) in 96-well 
sitting-drop plates. Several hits were found and optimized in 1 µl 
vapour-diffusion hanging drops (1:1 protein:well solution volume 
ratio), with the final optimized crystallization condition being 1.5 M 
ammonium sulfate, 200 mM sodium cacodylate pH 6.5. Clusters of 
hexagonal crystals often appeared 16-48 h later at room temperature. 
The crystals were manually separated with the Hampton Research 
Micro-Tools set, with some of the larger crystals reaching 0.2 × 0.2 × 
0.5 mm. 

Crystals were cryoprotected with a final concentration of 25% 
glycerol followed by flash-cooling in a nitrogen cryostream. For 
phasing, some of the crystals were soaked in cryosolution 
supplemented with 1 M sodium bromide for 30-60 s and immediately 
flash-cooled. Diffraction data were collected on the SER-CAT 
22-BM beamline at the Advanced Photon Source, Argonne National 
Laboratory. The bromide-soaked crystals were exposed to X-ray 
radiation with the wavelength tuned to the bromide absorption edge 
at 0.9198 Å, and a single-wavelength anomalous dispersion (SAD) data 
set was collected to 1.62 Å resolution. Additionally, a native data set 
was collected at a wavelength of 1 Å to a resolution of 1.55 Å. Both 
data sets were indexed with iMosflm (Battye et al., 2011) and scaled 
with AIMLESS (Winn et al., 2011), and resolution limits 
were determined using the CC1/2 criterion (Karplus & Diederichs, 
2012). The space group was determined to be P6₁ using POINTLESS 
(Evans, 2006, 2011). 
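
To make the CC1/2 criterion concrete, the following minimal sketch computes it on synthetic data: the repeated measurements of each unique reflection are split at random into two halves, each half is averaged, and the Pearson correlation between the two sets of averages is taken. This is not the AIMLESS implementation; the function names, the simulated intensities and the noise levels are all hypothetical and serve only to illustrate the idea.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def cc_half(observations, rng):
    """CC1/2: correlate the mean intensities of two random half data sets.

    observations holds one list of repeated intensity measurements
    per unique reflection.
    """
    half1, half2 = [], []
    for obs in observations:
        if len(obs) < 2:
            continue  # a single measurement cannot be split
        shuffled = list(obs)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half1.append(sum(shuffled[:mid]) / mid)
        half2.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    return pearson(half1, half2)

def simulate(n_reflections, multiplicity, signal_spread, noise_sigma, rng):
    """Synthetic data: true intensities plus Gaussian measurement noise."""
    data = []
    for _ in range(n_reflections):
        true_intensity = rng.uniform(0.0, signal_spread)
        data.append([rng.gauss(true_intensity, noise_sigma)
                     for _ in range(multiplicity)])
    return data

rng = random.Random(0)
cc_signal = cc_half(simulate(500, 8, 1000.0, 10.0, rng), rng)  # strong signal
cc_noise = cc_half(simulate(500, 8, 0.0, 10.0, rng), rng)      # noise only
print(f"CC1/2 with signal: {cc_signal:.3f}")
print(f"CC1/2 noise only:  {cc_noise:.3f}")
```

In a shell dominated by signal the two halves agree and CC1/2 approaches 1, whereas in a shell containing only noise it scatters around 0. The resolution cutoff is placed where the value in the outermost shell is still judged significant.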

With one subunit in the asymmetric unit, the Matthews coefficient 
(V_M) was calculated to be 3.32 Å³ Da⁻¹, with an estimated solvent 
content of 63% (Kantardjieff & Rupp, 2003). The Br-atom positions, 
phasing and initial model were determined using the AutoSol module 
of PHENIX (Adams et al., 2010). The model was further refined 
utilizing the native data set with iterative rounds of PHENIX 
automated refinement and manual refinement using Coot (Emsley & 
Cowtan, 2004). Residues 175-178 were not traced owing to missing 
electron density. Ramachandran plots revealed 98.3% of residues in 
favored regions and 1.7% in additionally allowed regions (Adams et 
al., 2010). All structural figures, including the electrostatic map, were 
produced using PyMOL (http://www.pymol.org). The topology 
diagram was constructed with Pro-origami (Stivala et al., 2011), with 
additional modifications made with Inkscape (http://inkscape.org/). 
Atomic coordinates and structure factors have been deposited in the 
Protein Data Bank (Berman et al., 2004) with accession code 4o6g. 
Crystallographic data are summarized in Table 1. 
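
The Matthews coefficient and solvent content quoted above can be reproduced approximately from the native unit-cell parameters. The sketch below assumes a hexagonal cell (γ = 120°), the six asymmetric units per cell of space group P6₁, the rounded molecular mass of 19.8 kDa from the text and the common approximation solvent fraction = 1 − 1.23/V_M. The function names are hypothetical, and small deviations from the tabulated values are expected because the exact mass used by the authors is not given.

```python
import math

def hexagonal_cell_volume(a, c):
    """Volume (Å^3) of a hexagonal cell: a = b, alpha = beta = 90°, gamma = 120°."""
    return a * a * c * math.sin(math.radians(120.0))

def matthews_coefficient(cell_volume, asym_units_per_cell, molecules_per_asu, mass_da):
    """V_M (Å^3 per Da): cell volume divided by the total protein mass in the cell."""
    return cell_volume / (asym_units_per_cell * molecules_per_asu * mass_da)

def solvent_fraction(vm):
    """Approximate solvent fraction, 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / vm

volume = hexagonal_cell_volume(91.81, 54.23)         # native cell from Table 1
vm = matthews_coefficient(volume, 6, 1, 19800.0)     # P6(1): six asymmetric units
solvent = solvent_fraction(vm)

print(f"cell volume:     {volume:.0f} Å^3")
print(f"V_M:             {vm:.2f} Å^3/Da")
print(f"solvent content: {100.0 * solvent:.1f} %")
```

With the rounded mass the result is a V_M of roughly 3.3 Å³ Da⁻¹ and a solvent content of about 63%, in line with the values reported for one molecule per asymmetric unit.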



3. Results and discussion 

3.1. General features of the Rv3902c structure 

Initial phases were determined from bromide-soaked crystals 
utilizing the single-wavelength anomalous dispersion (SAD) method 
(Dauter et al., 2000). An additional high-resolution native crystal data 
set was collected to 1.55 Å resolution. The final structure consists of 
174 residues and 216 waters (Fig. 1a). Electron density was not 
observed for residues 175-178. The core of Rv3902c consists of two 
main structural domains. The first domain is composed of two 
antiparallel β-sheets containing β-strands 1-5 and 7-9 (Fig. 1b) as well as 
two β-α-β motifs with 3₁₀-helices (B and C) positioned between 
β-strands 1 and 2 and between β-strands 2 and 3, and a third 3₁₀-helix 
(E) immediately preceding β-strand 7. The second domain consists of 
α-helices A, D, F and G and two short antiparallel β-strands 6 and 10. 
A feature of notable functional interest is the creation of a 
hydrophobic pocket with an acidic entrance between α-helices D and F that 
is ^1 A in diameter and ^1 A deep (Fig. 2). The interior surface of 
this pocket is lined with the side chains of residues Tyr80, Tyr84, 
Leu145, Tyr148, Arg141 (the aliphatic portion) and Ile156 as well as 
the main chains of residues Lys79 and Gly83. The carboxylic acid 
moiety of Glu144 and the hydroxyl of Tyr148 are major acid charge 
contributors located at the entrance to the pocket. This small pocket 
is located in the center (palm) of a hand-like binding motif, with the 
bottom of the palm and thumbs made of the antiparallel β-strands 
1-5 and 3₁₀-helices B and C, and with the fingers made up of α-helices 
D and F (Figs. 1a and 2). The surface of Rv3902c is highly charged, 
with an estimated pI of 4.77 (Gasteiger et al., 2005). 



Figure 1 

(a) Stereoview of the crystal structure of Rv3902c. (b) Secondary-structure 
diagram of Rv3902c. The structure is colored according to secondary-structure 
elements: loops, green; 3₁₀-helices, cyan; α-helices, red; β-strands, yellow. 
Each β-strand (1-10) is numbered and each helix is lettered (A-G). 
Figure 2 

Surface electrostatic potential map of Rv3902c generated by PyMOL, with basic 
and acidic regions in blue and red, respectively. The views differ by a 90° rotation 
about the vertical axis. 




Figure 3 

Alignment of Rv3902c (red) with InvB (green). The regions that are not 
transparent are the aligned homologous domains consisting of antiparallel 
β-strands 2-5 of Rv3902c. 



3.2. Homology search and analysis 

Three-dimensional structural homology searches utilizing the 
iPBA web server (Gelly et al., 2011) did not identify structures with 
reasonable homology to Rv3902c. However, a type III virulence-factor 
chaperone, InvB, from Salmonella exhibited an r.m.s.d. of 2.3 Å 
along the aligned homologous domains comprising 24.2% of the 
main-chain structure (60 residues) of Rv3902c. Virulence-factor 
chaperones vary greatly in structural homology and sequence 
similarity and are typically small acidic proteins without an ATP-binding 
site or hydrolytic function that are involved in the secretion and 
translocation of bacterial virulence proteins (Lilic et al., 2006). 
Rv3902c shares some of the physical characteristics of these 
chaperones: it is small (19.8 kDa) and acidic (estimated pI of 4.77). 
Fig. 3 shows the structures of Rv3902c and InvB superimposed along the 
aligned homologous domains. The aligned domain consists of the 
antiparallel β-strands 2-5 of Rv3902c, while much of the α-helical 
domains are not structurally homologous. If Rv3902c is a virulence-factor 
chaperone, it would represent the first known TB virulence-factor 
chaperone to be crystallized. Several virulence-factor chaperones 
from other species have been crystallized, often in complex with 
their respective virulence factors (Lilic et al., 2006; Phan et al., 
2004). The only other low-homology match to Rv3902c suggested by iPBA 
was a biotin protein ligase with a score of 3.2 Å along aligned regions 
comprising 10.5% of the structure of Rv3902c (Gelly et al., 2011). An 
additional homology search was performed using DALI (Holm & Rosenström, 
2010), but did not yield any clear matches among aligned domains. 
Finally, a functional search for Rv3902c conducted using the ProFunc 
server (Laskowski et al., 2005) yielded no significant hits. We 
therefore propose that EsxF, EsxE and Rv3903c may constitute a set of 
secreted virulence factors and that the Rv3902c protein may be a 
chaperone involved in the secretion of these proteins. 
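
For readers unfamiliar with the r.m.s.d. values quoted in this section, the sketch below shows how the quantity is defined for a set of paired main-chain atoms. It assumes that the two structures have already been superposed and that the atom pairing is known; the superposition and alignment steps performed by servers such as iPBA are not reproduced here. The function name and the coordinates are hypothetical and are not taken from the Rv3902c or InvB models.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between paired, pre-superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical main-chain atom positions (Å), for illustration only.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 3.8)]
# Every atom displaced by 2.3 Å along x, so the r.m.s.d. is 2.3 Å by construction.
shifted = [(x + 2.3, y, z) for x, y, z in reference]

print(f"r.m.s.d. identical: {rmsd(reference, reference):.2f} Å")
print(f"r.m.s.d. shifted:   {rmsd(reference, shifted):.2f} Å")
```

Because the deviations are squared before averaging, a few poorly matching atom pairs can dominate the value, which is one reason why an r.m.s.d. is normally reported together with the number of aligned residues.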